ABSTRACT

We consider the problem of interpolating a surface given its values at a finite number of points. We place a special emphasis on the question of choosing the location of the points where the function will be sampled.


Using minimal norm interpolation in reproducing kernel Hilbert spaces, equivalently Bayesian interpolation, and N-widths, we provide lower bounds for interpolation error relative to certain error criteria. These lower bounds can be used when evaluating an existing design, or when attempting to obtain a good design by iterative procedures, to decide whether further minimization is worthwhile. The bounds are given in terms of the eigenvalues of a relevant reproducing kernel, and the asymptotic behavior of these eigenvalues for certain tensor product spaces in the unit d-dimensional cube is obtained.

We demonstrate that for $H^m$, the d-dimensional tensor product of Sobolev spaces $H_2^m[0,1]$, and $P_N g$, the minimal norm interpolant to g at N given data points, the uniform convergence of $\|g - P_N g\|_{H^m}$ over g in the unit ball in $H^{2m}$ cannot proceed at a rate faster than $((\log N)^{d-1}/N)^{m}$. Certain conjectures concerning designs converging at this rate are made.


1.  Introduction


We are interested in the problem of recovering a surface $g(t)$, $t \in T$, from observations of g at a discrete set $T_N = \{t_i\}_{i=1}^N$ of points in T (called the "design"). In particular, we are interested in choosing $T_N$ so that an estimate, say $g_N$, of g from the data $\{g(t_i)\}_{i=1}^N$ is closest to g in some appropriate sense among all designs $T_N$.


This problem arises in numerous applications. To cite one group of examples, T may be a
sphere  (the  surface  of  the  earth)  or  a rectangle  and  g(t)  the  500  millibar  height  or  the 
temperature,  or  the  concentration  of  some  air  pollutant  at  position  t.  The  interpolation 
problem  requires  an  estimate  of  g over  the  entire  surface  given  its  values  on  TN  while  the 
design  problem  concerns  optimal  or  nearly  optimal  choices  of  TN. 

In this introduction we shall briefly survey several different ways of viewing the interpolation problem, i.e. reconstructing the function from its sample values, and then follow this discussion with a description of some known results for the design problem in one dimension.


In this discussion of interpolation we will distinguish between the Bayesian approach and the function-analytic or deterministic approach. We further distinguish the problem of estimating $g(t)$ for all $t \in T$ from estimating $g(t)$ at a single point in T, and we introduce the possibility that our observations are distorted with errors. However, this latter feature is not primary for our objectives here.




The Bayesian approach is as follows: We suppose $g(t)$, $t \in T$, is a Gaussian stochastic process, or "random field", with zero mean and given strictly positive definite (prior) covariance $K(s,t) = E\, g(s) g(t)$, $s, t \in T$. Given the data $g(t_1), \ldots, g(t_N)$, the Bayesian estimator for $g(t)$ is

(1.1)  $g_N(t) = E\{ g(t) \mid g(t_1), \ldots, g(t_N) \} = (K_{t_1}(t), \ldots, K_{t_N}(t))\, K_N^{-1}\, (g(t_1), \ldots, g(t_N))'$

where $K_s(t) = K(s,t)$ and $K_N$ is the $N \times N$ matrix with i, jth entry $K(t_i, t_j)$, and thus

$$\min_a E\Big( g(t) - \sum_{i=1}^N a_i g(t_i) \Big)^2 = E( g(t) - g_N(t) )^2.$$
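The estimator above can be sketched numerically. The following is a minimal illustration, not the authors' code: the covariance `K(s, t) = min(s, t)` (Brownian motion on (0, 1]), the design, the data values and the helper names `solve` and `bayes_estimate` are all assumptions chosen for the example.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def K(s, t):
    # Assumed prior covariance (Brownian motion); any strictly positive
    # definite kernel could be substituted here.
    return min(s, t)


def bayes_estimate(design, values, t):
    """Return (K_{t_1}(t), ..., K_{t_N}(t)) K_N^{-1} (g(t_1), ..., g(t_N))'."""
    K_N = [[K(ti, tj) for tj in design] for ti in design]
    coef = solve(K_N, values)  # K_N^{-1} applied to the data vector
    return sum(K(ti, t) * c for ti, c in zip(design, coef))


design = [0.25, 0.5, 1.0]   # hypothetical design T_N
values = [1.0, 3.0, 2.0]    # hypothetical observations g(t_i)
estimate = bayes_estimate(design, values, 0.75)
```

With this particular kernel the estimate reproduces the data at the design points and is linear between them, so `estimate` is 2.5 up to rounding.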


The functional analytic approach is closely related to the Bayesian approach. Instead of assuming that g is a stochastic process, suppose g is a fixed element of $H_K$, the reproducing kernel Hilbert space with reproducing kernel K.

Then $g_N$ may be shown to be the minimal norm interpolant to g on $T_N$ in the Hilbert space with reproducing kernel K. Observing that $\langle g, K_t \rangle_K = g(t)$, where $\langle \cdot, \cdot \rangle_K$ is the inner product in $H_K$, it can be verified that if $P_N g$ is the minimal norm interpolant of g on $T_N$, i.e.

$$\|P_N g\|_K = \min \{ \|h\|_K : h(t_i) = g(t_i),\ i = 1, \ldots, N \}, \qquad (P_N g)(t_i) = g(t_i),$$

then $P_N g$ is the orthogonal projection of g onto $\mathrm{span}\{K_{t_i}\}_{i=1}^N$ and that $P_N g = g_N$; see Kimeldorf and Wahba [6]. In particular,

$$E( g(t) - g_N(t) )^2 = \min_a \max_{\langle f, f \rangle_K \le 1} \Big| f(t) - \sum_{i=1}^N a_i f(t_i) \Big|^2.$$

Minimal norm interpolation also has the striking property that it furnishes the best estimator for $g(t)$, $g \in H_K$, among all estimators (linear or nonlinear in $g(t_1), \ldots, g(t_N)$) which use the information $g(t_1), \ldots, g(t_N)$ with $\langle g, g \rangle_K \le 1$, that is,

$$\min_A \max_{\langle f, f \rangle_K \le 1} | f(t) - A(f(t_1), \ldots, f(t_N)) |,$$

where A is any map from $(f(t_1), \ldots, f(t_N))$ into the real line, is achieved for $A(g(t_1), \ldots, g(t_N)) = g_N(t)$. This property and various extensions and related matters in other normed spaces are described in C. A. Micchelli and T. J. Rivlin [13].

In each instance above the data is viewed as known exactly. Frequently in applications only "noisy" data is available and this leads to the problem of data smoothing. We briefly discuss this problem in Section 5.

As a criterion for choosing $T_N$ we minimize

(1.2)  $E \int_T ( g(t) - g_N(t) )^2\, dt = J(T_N)$

where the expectation is taken with respect to the prior covariance $K(s,t)$. It is not hard to show that

(1.3)  $J = J(T_N) = \int_T \| K_t - P_N K_t \|_K^2\, dt.$

In practice K may have to be estimated by use of a finer trial grid of points than will ultimately be used, or from physical principles governing the phenomena under study. The covariance of air pollution measurements, for example, surely depends on the local geography. If K is known, then frequently the minimization of J will have to be carried out numerically. In this paper we will provide a lower bound for J in terms of the eigenvalues associated with the integral operator induced by K. Thus, trial solutions for the design $T_N$ minimizing J may be compared against the lower bound to decide whether further minimization is worthwhile.

Theorem 1. Let the operator K defined by $(Kf)(t) = \int_T K(t,s) f(s)\, ds$ be a symmetric compact operator of $L_2(T)$ into itself and have eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots$. Then

$$\inf_{T_N} J(T_N) \ge \sum_{i=N+1}^{\infty} \lambda_i = \int_T K(t,t)\, dt - \sum_{i=1}^{N} \lambda_i.$$

It is not known whether or not this lower bound can be achieved.
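As a rough illustration of how a trial design can be compared against the bound of Theorem 1, the sketch below evaluates $J(T_N)$ by a midpoint rule and the eigenvalue bound for one assumed kernel. The choice $K(s,t) = \min(s,t)$ on $T = [0,1]$ is an assumption made for the example; for it $\int_T K(t,t)\,dt = 1/2$ and the eigenvalues are $\lambda_i = ((i - 1/2)\pi)^{-2}$. The names `solve`, `pointwise_error`, `J` and `lower_bound` are hypothetical.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def K(s, t):
    return min(s, t)  # assumed covariance kernel on [0, 1]


def pointwise_error(design, t):
    """||K_t - P_N K_t||_K^2 = K(t,t) - k(t)' K_N^{-1} k(t)."""
    K_N = [[K(ti, tj) for tj in design] for ti in design]
    k = [K(ti, t) for ti in design]
    w = solve(K_N, k)
    return K(t, t) - sum(ki * wi for ki, wi in zip(k, w))


def J(design, cells=400):
    """Midpoint-rule approximation of the integrated error over [0, 1]."""
    h = 1.0 / cells
    return h * sum(pointwise_error(design, (i + 0.5) * h) for i in range(cells))


def lower_bound(N):
    """Integral of K(t,t) minus the N largest eigenvalues (min kernel only)."""
    return 0.5 - sum(1.0 / ((i - 0.5) * math.pi) ** 2 for i in range(1, N + 1))


design = [0.25, 0.5, 0.75, 1.0]  # equally spaced trial design, N = 4
J_value = J(design)
bound = lower_bound(len(design))
```

For this trial design `J_value` is close to 1/24 and `bound` is roughly 0.025, so in this example the equally spaced design is within a factor of two of the lower bound.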

A fair amount is known about optimum designs for T = [0,1]; see Sacks and Ylvisaker [15], Wahba [19], Hajek and Kimeldorf [3]. A sequential procedure for choosing an optimum design for T = [0,1] is given in Athavale and Wahba [1]. The sequential procedure depends heavily on properties of optimal designs known from the earlier papers and does not at present generalize to T = [0,1] x [0,1] or the sphere. In fact it appears that nothing is known about best possible convergence rates in several dimensions for $\|g - P_N g\|_K^2$; see Ylvisaker [21].

Sacks and Ylvisaker [16] have shown that $\|g - P_N g\|_K$ is the variance of the Gauss-Markov estimate of $\theta$ in the model

$$Y(t) = \theta g(t) + X(t), \quad t \in T, \qquad E\, X(s) X(t) = K(s,t),$$

given $Y(t)$, $t \in T_N$.

If it is known that g is in some class C then it might be desirable to choose $T_N$ to minimize $\sup_{g \in C} \|g - P_N g\|_K$. Through the notion of N-widths, introduced by Kolmogorov [7], and asymptotic estimates of the eigenvalues of certain integral operators we will provide lower bounds for the supremum of the design error for g in a certain class C. The class we will consider here is the natural generalization of the function class for which optimal one-dimensional designs were obtained in [3, 15, 19]. Before stating this result we review briefly some results for optimal experimental design from [19] for T = [0,1].




Theorem 3. Let $H_K = \otimes^d H_Q$ where $H_Q$ is an r.k.h.s. on [0,1] with Q satisfying (1.3). Then
(1.3) $= O\big( ((\log N)^{d-1}/N)^{2m} \big)$.

Based on this result, we make some conjectures (Section 4) concerning good designs in $H_K$ using results from the multi-dimensional quadrature literature. In particular, we conjecture (1.3) is the optimal rate, which has only been proved for d = 1, as explained above. Finally in Section 5 we make some observations concerning noisy data.

2.  Lower  bounds  for  optimal  designs. 

We  begin  with  the  proof  of 

Theorem 1. Let the (symmetric) operator K defined by $(Kf)(t) = \int_T K(t,s) f(s)\, ds$ be compact with eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots$. Then

$$\inf_{T_N} J(T_N) = \inf_{T_N} \int_T \| K_t - P_N K_t \|_K^2\, dt \ge \sum_{i=N+1}^{\infty} \lambda_i.$$

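Proof: The equality is immediate from (1.2). Since

$$\int_T \| K_t - P_N K_t \|_K^2\, dt = \int_T \| K_t \|_K^2\, dt - \int_T \| P_N K_t \|_K^2\, dt = \sum_{i=1}^{\infty} \lambda_i - \int_T \| P_N K_t \|_K^2\, dt,$$

it suffices to show that $\int_T \| P_N K_t \|_K^2\, dt \le \sum_{i=1}^{N} \lambda_i$.

Let $\phi_1, \ldots, \phi_N$ be any N orthonormal functions in $H_K$. Then the projection of $K_t$ onto $\mathrm{span}\{\phi_i\}_{i=1}^N$ is

$$\sum_{i=1}^{N} \phi_i(t)\, \phi_i$$

and

$$\int_T \| P_N K_t \|_K^2\, dt = \sum_{i=1}^{N} \| \phi_i \|_{L_2}^2.$$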

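Let $K^{1/2}$ be the symmetric square root of the operator K. Then by the properties of the reproducing kernel norm,

$$\| \phi_i \|_K = \| K^{-1/2} \phi_i \|_{L_2}.$$

Now by the extremal properties of the eigenvalues of K,

$$\sup_{\psi \in L_2} \frac{\| K^{1/2} \psi \|_{L_2}^2}{\| \psi \|_{L_2}^2} = \lambda_1, \qquad \sup_{\langle \psi, \psi_1 \rangle_{L_2} = 0} \frac{\| K^{1/2} \psi \|_{L_2}^2}{\| \psi \|_{L_2}^2} = \lambda_2,$$

where $\psi_1$ is the maximizing element for the first equality above, etc. Thus,

$$\sum_{i=1}^{N} \| \phi_i \|_{L_2}^2 \le \sum_{i=1}^{N} \lambda_i.$$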


This result is also a consequence of a classical result from the theory of integral equations (see [18], 149).

Theorem 2. Let $H_N$ be any N-dimensional subspace in $H_K$ and $P_N$ be the orthogonal projection onto $H_N$. Then there exists a function g,

$$g(t) = \int_T K(t,s) \rho(s)\, ds,$$

such that

(2.1)  $\| g - P_N g \|_K^2 \ge \lambda_{N+1} \int_T \rho^2(s)\, ds.$
Proof: The proof of this theorem also follows directly from the extremal properties for the eigenvalues of $K^{1/2}$ and has an interpretation in the theory of N-widths [17]. Specifically we have

$$\inf_{H_N} \sup_{g \in C} \| g - P_{H_N} g \|_K^2 = \inf_{H_N} \sup_{\|\rho\|_{L_2} = 1} \| K\rho - P_{H_N} K\rho \|_K^2 = \inf_{H_N} \sup_{\|\rho\|_{L_2} = 1} \| K^{1/2}\rho - P_{H_N} K^{1/2}\rho \|_{L_2}^2.$$

The extremal properties of eigenvalues and eigenfunctions of symmetric operators imply that the last expression achieves its minimum for $H_N$ equal to the span of the first N eigenfunctions of K and the value of the minimum is $\lambda_{N+1}$.

To prove the existence of an optimal design we must find a subspace of the form $\mathrm{span}\{K_{t_i}\}_{i=1}^N$ for some design $T_N$ which achieves the lower bound in (2.1), which would be expected to be close to the span of the first N eigenfunctions.

It should not be expected that for an arbitrary covariance kernel K an optimal design exists for each N. However, for certain classes of kernels existence of optimal designs has been shown; see Melkman [10]; Melkman and Micchelli [11].

3.  Good  designs  in  Tensor  Product  Spaces 

To make use of Theorem 2 we will obtain the asymptotic rate of decay of the eigenvalues of the r.k. for $H_K$ of the form

$H_K = \otimes^d H_Q$  (tensor product of d copies of $H_Q$)

where $H_Q$ is an r.k.h.s. of functions on [0,1] with eigenvalues that decay as a power, $\lambda_\nu = c\, \nu^{-2m} (1 + o(1))$, $\nu \to \infty$. For instance, if Q behaves as a Green's function for a 2m-th order linear differential operator this condition is satisfied. As a simple example of this possibility,
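let $H_Q = \{ f : f^{(\nu)}$ abs. cont., $\nu = 0, 1, \ldots, m-1$, $f^{(m)} \in L_2[0,1]$, $f^{(\nu)}(0) = f^{(\nu)}(1)$, $\nu = 0, 1, \ldots, m-1 \}$ with inner product

$$\langle f, g \rangle = \Big( \int_0^1 f(u)\, du \Big) \Big( \int_0^1 g(u)\, du \Big) + \int_0^1 f^{(m)}(u)\, g^{(m)}(u)\, du.$$

Then the r.k. Q is

$$Q(s,t) = 1 + 2 \sum_{\nu=1}^{\infty} \frac{\cos 2\pi\nu(s-t)}{(2\pi\nu)^{2m}}$$

and the corresponding eigenvalues and eigenfunctions are

$$\{ 1,\ (2\pi\nu)^{-2m} : \nu = \pm 1, \ldots \}, \qquad \{ e^{2\pi i \nu t} : \nu = 0, \pm 1, \ldots \}.$$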
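The series form of the periodic kernel can be checked numerically. The sketch below is an illustration under stated assumptions: it truncates the cosine series for Q and, for the special case m = 1 only, compares it with the closed form $1 + B_2(x)/2$, where $B_2(x) = x^2 - x + 1/6$ is the second Bernoulli polynomial and x is the fractional part of s - t. The function names and the truncation length are hypothetical choices.

```python
import math


def Q_series(s, t, m, terms=5000):
    """Truncated series 1 + 2 * sum_{nu>=1} cos(2 pi nu (s-t)) / (2 pi nu)^(2m)."""
    u = s - t
    return 1.0 + 2.0 * sum(
        math.cos(2.0 * math.pi * nu * u) / (2.0 * math.pi * nu) ** (2 * m)
        for nu in range(1, terms + 1)
    )


def Q_closed_m1(s, t):
    """Closed form for m = 1 only: 1 + B_2(x) / 2 with x = frac(s - t)."""
    x = (s - t) % 1.0
    return 1.0 + 0.5 * (x * x - x + 1.0 / 6.0)


# Compare the two forms at a few arbitrary argument pairs.
pairs = [(0.0, 0.0), (0.8, 0.5), (0.1, 0.6)]
gaps = [abs(Q_series(s, t, 1) - Q_closed_m1(s, t)) for s, t in pairs]
```

The entries of `gaps` are limited only by the truncation of the series, which converges like 1/terms for m = 1 and much faster for larger m.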


Theorem 3. Let $H_K = \otimes^d H_Q$ where the eigenvalues $\{\lambda_\nu\}$ of $H_Q$ satisfy $\lambda_\nu = \nu^{-2m} (1 + o(1))$. Then the eigenvalues $\{\xi_N\}$ of $H_K$ satisfy

$$\xi_N = \Big( \frac{(\log N)^{d-1}}{N} \Big)^{2m} (1 + o(1)).$$

Proof. Since $K = \otimes^d Q$, the eigenvalues of K are the tensor product of the eigenvalues of Q, i.e. if $\xi_1 \ge \xi_2 \ge \cdots$ are the eigenvalues of K then

$$\{ \xi_N \} = \Big\{ \prod_{j=1}^{d} \lambda_{\nu_j} : \nu_j = 1, 2, \ldots \Big\}.$$

To estimate the decay of $\xi_N$ we observe that the number of lattice points satisfying $\prod_{j=1}^{d} \nu_j \le k$ is $k (\log k)^{d-1} (1 + o(1))$. Hence, since $\lambda_\nu = \nu^{-2m} (1 + o(1))$, choosing $k = [N / (\log N)^{d-1}]$ ([x] = greatest integer $\le x$) gives the desired conclusion,

$$\xi_N = \Big( \frac{(\log N)^{d-1}}{N} \Big)^{2m} (1 + o(1)).$$
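The lattice-point count used in the proof can be checked directly in the simplest case. The sketch below covers d = 2 only, which is an assumption made to keep the example short: it counts pairs of positive integers with product at most k and compares the count with k log k. The function name and the value of k are hypothetical.

```python
import math


def count_lattice_points(k):
    """Number of pairs (nu1, nu2) of positive integers with nu1 * nu2 <= k."""
    # For each nu1 there are floor(k / nu1) admissible values of nu2.
    return sum(k // nu1 for nu1 in range(1, k + 1))


k = 10 ** 5
count = count_lattice_points(k)
ratio = count / (k * math.log(k))  # tends to 1 slowly as k grows
```

The ratio approaches 1 only at a logarithmic rate, which is one reason the o(1) terms in Theorem 3 can be slow to become negligible in practice.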


It is not known for d > 1 whether there exists a design for which

$$\| g - P_N g \|_K^2 \le \mathrm{const} \Big( \frac{(\log N)^{d-1}}{N} \Big)^{2m} \int_T \rho^2(u)\, du.$$

However, designs with a convergence rate approaching the optimum rate have been given in Wahba [20] for d = 2.

Define 


Z'  - \±-:k  - I *>\ 


and 


7\.  ,=  u Z'jtoz'+'-i. 

•Vr  jm  | n 

In  1201  it  is  shown  that  TN  ^ has  N » ((  + distinct  points  and  for 

hk  - HQ*HQt 


(3.1)  II l-PT  *1  |2<consti^i-(  f p2(u)du)(\  + o(l)) 
' N.t  _</  + l)2m  JT 


< const 


((  + 2) 


2+/'±i}2m 

V / + 2 / 


N 


(— )2" 
V /+ 2 ' 


-([  p2(u)du)(  1+0(1)) 
JT 


where $g = K\rho$ and $o(1) \to 0$ as $n \to \infty$.

Choosing  t ■ ( log  N)p(\  + o(l))  for  any  p.  0 < p < 1 we  have 
N - (t  + I )//♦*(!  + o(D) 


or  log  A I ~ p log  log  N * ( log  N)p  log  n(l  + o(l))-  Hence 

log  N-p  log  log  N 


logs  ■ (I  + o(l))- 


( log  N)P 



and  n -*»  provided  0<p<l  (for  p-l  this  conclusion  fails).  Setting  p » — -into  (3.1)  gives 

fft  4"  I 

^ It  — inVlu+l) 

n«-^ni-«((l-w  ^ )"■ 

t +2 
N 

a convergence rate which approaches the optimum rate of $(\log N / N)^{2m}$ implied by Theorems 2 and 3.

4.  Optimal quadrature - a conjecture

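A quadrature formula for $\int_T g(t)\, dt$ can be obtained by setting $\int_T (P_N g)(t)\, dt = \sum_{j=1}^{N} c_j g(t_j)$. Then

$$\Big| \int_T g(t)\, dt - \int_T (P_N g)(t)\, dt \Big| = | \langle \eta, g - P_N g \rangle_K | = | \langle \eta - P_N \eta, g - P_N g \rangle_K | \le \| \eta - P_N \eta \|_K \| g - P_N g \|_K \le \| \eta - P_N \eta \|_K \| g \|_K$$

where $\eta$ is the representer of integration in $H_K$,

$$\eta(s) = \langle \eta, K_s \rangle_K = \int_T K(s,u)\, du.$$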
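The weights $c_j$ of this quadrature formula solve the linear system $K_N c = (\eta(t_1), \ldots, \eta(t_N))'$, since $\int_T (P_N g)(t)\, dt = \langle P_N \eta, g \rangle_K$. A minimal sketch follows, assuming the kernel $K(s,u) = \min(s,u)$ on T = [0,1], for which $\eta(s) = s - s^2/2$; the kernel, the design and the helper names are illustrative assumptions.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def K(s, u):
    return min(s, u)  # assumed kernel on [0, 1]


def eta(s):
    # Representer of integration for the assumed kernel:
    # integral over [0, 1] of min(s, u) du = s - s^2 / 2.
    return s - 0.5 * s * s


def quadrature_weights(design):
    """Weights c with integral of P_N g equal to sum_j c_j g(t_j)."""
    K_N = [[K(ti, tj) for tj in design] for ti in design]
    return solve(K_N, [eta(ti) for ti in design])


weights = quadrature_weights([0.25, 0.5, 1.0])
```

For this kernel the interpolant is piecewise linear and vanishes at 0, so the weights coincide with the trapezoidal rule on the nodes 0, 0.25, 0.5, 1, namely 0.25, 0.375 and 0.25.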

An optimal quadrature problem may be formulated as: Find $t_1, \ldots, t_N$ to minimize

$$\| \eta - P_N \eta \|_K.$$

There is a large literature on choosing sequences in the d-dimensional unit cube which make the error for the special quadrature formula $\frac{1}{N} \sum_{i=1}^{N} g(t_i)$ asymptotically small. This work has focused on finding sequences $T_N = \{t_i\}_{i=1}^N$ for which the discrepancy $D_N$ defined by

$$D_N = \sup_t | F_N(t) - F(t) |$$

is small. Here $F_N$ is the cumulative distribution function of the point set and F is the cumulative distribution of the uniform density; see Kuipers and Niederreiter [8], Halton [4], Halton and Zaremba [5], Zaremba [22].


It is known that the Hammersley sequences, defined below, have discrepancy

$$D_N = O\Big( \frac{(\log N)^{d-1}}{N} \Big),$$

see Halton [4].


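These sequences are defined (in d dimensions) by

$$T_N = \Big\{ \Big( \frac{i}{N}, \phi_{p_1}(i), \ldots, \phi_{p_{d-1}}(i) \Big) \Big\}_{i=0}^{N-1}$$

where the subscripts in the $\phi$'s are successive primes and, if $n = \sum_{k=0}^{M} n_k p^k$ where $M = [\log_p n]$, then $\phi_p(n) = \sum_{k=0}^{M} n_k p^{-k-1}$.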
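The radical-inverse construction can be sketched as follows. This is an illustration under assumptions: the index is taken to run over i = 0, ..., N-1, the first coordinate is i/N, and the remaining coordinates use the primes 2, 3, 5, ... in order. The names `radical_inverse`, `hammersley` and `PRIMES` are hypothetical.

```python
def radical_inverse(n, p):
    """Reflect the base-p digits of n about the radix point.

    If n = sum_k n_k p^k, the result is sum_k n_k p^(-k-1).
    """
    inv, scale = 0.0, 1.0 / p
    while n > 0:
        n, digit = divmod(n, p)
        inv += digit * scale
        scale /= p
    return inv


PRIMES = [2, 3, 5, 7, 11, 13]  # enough for dimensions up to 7 in this sketch


def hammersley(N, d):
    """Return N points in the d-dimensional unit cube (assumed convention)."""
    if not 1 <= d <= len(PRIMES) + 1:
        raise ValueError("dimension not supported by this sketch")
    return [
        tuple([i / N] + [radical_inverse(i, p) for p in PRIMES[: d - 1]])
        for i in range(N)
    ]


points = hammersley(4, 2)
```

With N = 4 and d = 2 the points are (0, 0), (0.25, 0.5), (0.5, 0.25) and (0.75, 0.75), so every row and column of the 4 x 4 grid contains exactly one point.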

Bounds on

$$\epsilon_N = \Big| \int_T g(t)\, dt - \frac{1}{N} \sum_{i=1}^{N} g(t_i) \Big|$$

in terms of the discrepancy appear in the literature; see Kuipers and Niederreiter [8, p. 157], Zaremba [22] and references therein.

In [8] it is shown for certain sequences that c N m 0(D9N) where g has

Fourier coefficients c satisfying


I c*i  I - «/ 

<n*/ 

#»i 


where 


. (Ki 

l I e(«0 


1A 


We conjecture that similar results obtain for Hammersley sequences and that for spaces $H_K$ satisfying the hypothesis of Theorem 2 the optimal convergence rate

$$\| g - P_N g \|_K^2 = \lambda_{N+1} (1 + o(1)) = \Big( \frac{(\log N)^{d-1}}{N} \Big)^{2m} (1 + o(1))$$

will hold for $T_N$ the Hammersley sequence and g of the form $g = K\rho$.


5.  Noisy Data


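In this section we include some remarks concerning estimation based on inaccurate data. If instead of observing $g(t)$, $t \in T_N$, we assume that the data is given by

$$y_i = g(t_i) + \epsilon_i, \qquad E\, \epsilon_i \epsilon_j = \sigma^2 \delta_{ij},$$

then the minimum norm estimator

$$\min_a E\Big( g(t) - \sum_{i=1}^{N} a_i y_i \Big)^2 = \min_a \Big\{ \Big\| K_t - \sum_{i=1}^{N} a_i K_{t_i} \Big\|_K^2 + \sigma^2 \sum_{i=1}^{N} a_i^2 \Big\}$$

leads in the functional-analytic approach to

(5.1)  $\min_a \Big\{ \max_{\langle f, f \rangle_K \le 1} \Big| f(t) - \sum_{i=1}^{N} a_i f(t_i) \Big|^2 + \sigma^2 \sum_{i=1}^{N} a_i^2 \Big\}.$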


Recently, this variational problem has been solved by Laurent [9]. It has been shown that the minimum norm estimator is the smoothing "spline" in $H_K$ with parameter $\sigma^{-2}$, that is, if $g_N$ minimizes

(5.2)  $\min_f \Big\{ \| f \|_K^2 + \sigma^{-2} \sum_{i=1}^{N} ( f(t_i) - g(t_i) )^2 \Big\}$

then $g_N(t) = \sum_{i=1}^{N} c_i(t)\, g(t_i)$ is the minimum norm estimator for g when we have noisy data.
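The smoothing estimate can be computed from a linear system very similar to the one used for exact data: the minimizer of the penalized criterion evaluates to $(K_{t_1}(t), \ldots, K_{t_N}(t)) (K_N + \sigma^2 I)^{-1} y$. The sketch below illustrates this under assumptions: the kernel $K(s,t) = \min(s,t)$, the design, the data and the function names are all hypothetical choices.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def K(s, t):
    return min(s, t)  # assumed kernel on (0, 1]


def smoothing_estimate(design, y, t, sigma2):
    """Evaluate k(t)' (K_N + sigma^2 I)^{-1} y at the point t."""
    N = len(design)
    A = [
        [K(design[i], design[j]) + (sigma2 if i == j else 0.0) for j in range(N)]
        for i in range(N)
    ]
    coef = solve(A, y)
    return sum(K(ti, t) * c for ti, c in zip(design, coef))


design = [0.25, 0.5, 1.0]
y = [1.0, 3.0, 2.0]                      # hypothetical noisy observations
exact = smoothing_estimate(design, y, 0.75, 0.0)     # sigma^2 = 0: interpolation
smooth = smoothing_estimate(design, y, 0.5, 0.1)     # no longer hits y = 3.0
flat = smoothing_estimate(design, y, 0.75, 1.0e6)    # heavy smoothing
```

Setting `sigma2` to zero recovers the interpolant, while a very large `sigma2` shrinks the estimate toward the zero prior mean; the same `sigma2` is used whatever point t is evaluated.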

Note that the smoothing parameter $\sigma^2$ does not depend on the value $t \in T$ at which we choose to estimate $g(t)$. The following short proof of this result is instructive: We wish to determine

$$\min_a \Big\{ \Big\| K_t - \sum_{i=1}^{N} a_i K_{t_i} \Big\|_K^2 + \sigma^2 \sum_{i=1}^{N} a_i^2 \Big\}.$$

To this end, we introduce the tensor product space $H_K \otimes R^N = \{ (f, a) \mid f \in H_K,\ a \in R^N \}$ with the norm

$$\| (f, a) \|^2 = \| f \|_K^2 + \sigma^2 \sum_{i=1}^{N} a_i^2.$$

Then the above problem in $H_K \otimes R^N$ becomes

$$\min_a \Big\| h - \sum_{j=1}^{N} a_j h_j \Big\|^2, \qquad h = (K_t, 0), \quad h_j = (K_{t_j}, e_j), \quad (e_j)_k = \delta_{jk}.$$

But from the theory for estimating exactly given data, as in (1.1), the minimizing $a = (a_1, \ldots, a_N)$ may be obtained from the best interpolant,

$$\min \| (f, c) \|^2 = \min_f \Big\{ \| f \|_K^2 + \sigma^{-2} \sum_{i=1}^{N} ( g(t_i) - f(t_i) )^2 \Big\},$$

in agreement with (5.2).

It has not yet been determined if the optimality of smoothing "splines" persists for an estimator of the full function $g(t)$, $t \in T$, when the error criterion (1.2) is used. However, let us replace (5.1) by

$$\min_a \max_{\langle f, f \rangle_K \le 1,\ \sum_{i=1}^{N} r_i^2 \le 1} \Big| f(t) - \sum_{i=1}^{N} a_i ( f(t_i) + \epsilon r_i ) \Big|^2,$$

that is, we minimize the worst least square error when we know the noise in the data is in the region $y_i = f(t_i) + \epsilon r_i$, $\sum_{i=1}^{N} r_i^2 \le 1$. It has been shown that in this setting the smoothing spline is also optimal. However, unlike (5.1), the smoothing parameter depends on $\epsilon$ as well as $t \in T$. Moreover, this theory holds in great generality, including, in particular, estimating the full function $g(t)$, $t \in T$ (see Melkman and Micchelli [13] for the details). For methods of choosing the smoothing parameter using a cross-validation procedure based on the data see Craven and Wahba [2].

The design problem of Theorem 1 has an analogue for noisy data which may be described as follows. Let $g(t)$, $t \in T$, be a stochastic process as before and let

$$g_N(t) = E\{ g(t) \mid y_1, \ldots, y_N \} = (K_{t_1}(t), \ldots, K_{t_N}(t)) (K_N + \sigma^2 I)^{-1} (y_1, \ldots, y_N)'.$$

Then $J_N$ becomes

$$J_N = E \int_T ( g(t) - g_N(t) )^2\, dt = \int_T \Big\{ K(t,t) - (K_{t_1}(t), \ldots, K_{t_N}(t)) (K_N + \sigma^2 I)^{-1} (K_{t_1}(t), \ldots, K_{t_N}(t))' \Big\}\, dt,$$

which may be compared to equation (1.2).